Using Human Demonstrations to Improve Reinforcement Learning
Abstract
This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration.
Related resources
Integrating reinforcement learning with human demonstrations of varying ability
This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning sig...
Integrating Human Demonstration and Reinforcement Learning: Initial Results in Human-Agent Transfer
This work introduces Human-Agent Transfer (HAT), a method that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations can be transferred into a baseline policy for an agent, and reinforcement learning can be used t...
Inverse Reinforce Learning with Nonparametric Behavior Clustering
Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) without defining the reward function, and a set of demonstrations generated by humans/experts. However, in practice, it may be unreasonable to assume that human behaviors can be explained by one reward function since they may be inherently inconsistent. Also, demonstration...
Toward Probabilistic Safety Bounds for Robot Learning from Demonstration
Learning from demonstration is a popular method for teaching robots new skills. However, little work has looked at how to measure safety in the context of learning from demonstrations. We discuss three different types of safety problems that are important for robot learning from human demonstrations: (1) using demonstrations to evaluate the safety of a robot’s current policy, (2) using demonstr...
Imitation and Reinforcement Learning from Failed Demonstrations
Current work in robotic imitation learning uses successful demonstrations of a task performed by a human teacher to initialize a robot controller. Given a reward function, this learned controller can then be improved using techniques derived from reinforcement learning. We instead use failed attempts, which may be more plentiful, to initialize our controller and, taking them as illustrations of...